Analysis and manipulation of objects and layers in surround views

ABSTRACT

Various embodiments of the present invention relate generally to systems and methods for analyzing and manipulating images and video. According to particular embodiments, the spatial relationship between multiple images and video is analyzed together with location information data, for purposes of creating a representation referred to herein as a surround view for presentation on a device. An object included in the surround view may be manipulated along axes by manipulating the device along corresponding axes. In particular embodiments, a surround view can be separated into layers. Effects can be applied to one or more of these layers to enhance the interactive and immersive viewing experience of the surround view.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. §119 to Provisional U.S. Patent Application No. 61/903,359 (Attorney Docket No. FYSNP001P) by Holzer et al., filed on Nov. 12, 2013, titled “Systems and Methods For Providing Surround Views,” which is incorporated by reference herein in its entirety and for all purposes.

TECHNICAL FIELD

The present disclosure relates to layers in surround views, which includes providing a multi-view interactive digital media representation.

DESCRIPTION OF RELATED ART

With modern computing platforms and technologies shifting towards mobile and wearable devices that include camera sensors as native acquisition input streams, the desire to record and preserve moments digitally in a different form than more traditional two-dimensional (2D) flat images and videos has become more apparent. Traditional digital media formats typically limit their viewers to a passive experience. For instance, a 2D flat image can be viewed from one angle and is limited to zooming in and out. Accordingly, traditional digital media formats, such as 2D flat images, do not easily lend themselves to reproducing memories and events with high fidelity.

Current predictions (Ref: KPCB “Internet Trends 2012” presentation) indicate that every several years the quantity of visual data that is being captured digitally online will double. As this quantity of visual data increases, so does the need for much more comprehensive search and indexing mechanisms than ones currently available. Unfortunately, neither 2D images nor 2D videos have been designed for these purposes. Accordingly, improved mechanisms that allow users to view and index visual data, as well as query and quickly receive meaningful results from visual data are desirable.

OVERVIEW

Various embodiments of the present invention relate generally to systems and methods for analyzing and manipulating images and video. According to particular embodiments, the spatial relationship between multiple images and video is analyzed together with location information data, for purposes of creating a representation referred to herein as a surround view for presentation on a device. An object included in the surround view may be manipulated along axes by manipulating the device along corresponding axes. In particular embodiments, a surround view can be separated into layers. Effects can be applied to one or more of these layers to enhance the interactive and immersive viewing experience of the surround view.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate particular embodiments of the present invention.

FIG. 1 illustrates an example of a surround view acquisition system.

FIG. 2 illustrates an example of a process flow for generating a surround view.

FIG. 3 illustrates one example of multiple camera views that can be fused into a three-dimensional (3D) model to create an immersive experience.

FIG. 4A illustrates one example of separation of content and context in a surround view.

FIG. 4B illustrates one example of layering in a surround view.

FIG. 4C illustrates one example of a process for modifying a layer in a surround view.

FIGS. 5A-5B illustrate examples of concave view and convex views, respectively, where both views use a back-camera capture style.

FIGS. 6A-6E illustrate examples of various capture modes for surround views.

FIG. 7A illustrates one example of a process for recording data that can be used to generate a surround view.

FIG. 7B illustrates one example of a dynamic panorama capture process.

FIG. 7C illustrates one example of a dynamic panorama capture process where the capture device is rotated through the axis of rotation.

FIG. 7D illustrates one example of a dynamic panorama with dynamic content.

FIG. 7E illustrates one example of capturing a dynamic panorama with a 3D effect.

FIG. 7F illustrates one example of a dynamic panorama with parallax effect.

FIG. 7G illustrates one example of an object panorama capture process.

FIG. 7H illustrates one example of a background panorama with an object panorama projected on it.

FIG. 7I illustrates one example of multiple objects constituting an object panorama.

FIG. 7J illustrates one example of changing the viewing angle of an object panorama based on user navigation.

FIG. 7K illustrates one example of a selfie panorama capture process.

FIG. 7L illustrates one example of a background panorama with a selfie panorama projected on it.

FIG. 7M illustrates one example of extended views of panoramas based on user navigation.

FIG. 8 illustrates an example of a surround view in which three-dimensional content is blended with a two-dimensional panoramic context.

FIG. 9 illustrates one example of a space-time surround view being simultaneously recorded by independent observers.

FIG. 10 illustrates one example of separation of a complex surround-view into smaller, linear parts.

FIG. 11 illustrates one example of a combination of multiple surround views into a multi-surround view.

FIG. 12 illustrates one example of a process for prompting a user for additional views of an object of interest to provide a more accurate surround view.

FIGS. 13A-13B illustrate an example of prompting a user for additional views of an object to be searched.

FIG. 14 illustrates one example of a process for navigating a surround view.

FIG. 15 illustrates an example of swipe-based navigation of a surround view.

FIG. 16A illustrates examples of a sharing service for surround views, as shown on a mobile device and browser.

FIG. 16B illustrates examples of surround view-related notifications on a mobile device.

FIG. 17A illustrates one example of a process for providing object segmentation.

FIG. 17B illustrates one example of a segmented object viewed from different angles.

FIG. 18 illustrates one example of various data sources that can be used for surround view generation and various applications that can be used with a surround view.

FIG. 19 illustrates one example of a process for providing visual search of an object, where the search query includes a surround view of the object and the data searched includes three-dimensional models.

FIG. 20 illustrates one example of a process for providing visual search of an object, where the search query includes a surround view of the object and the data searched includes two-dimensional images.

FIG. 21 illustrates an example of a visual search process.

FIG. 22 illustrates an example of a process for providing visual search of an object, where the search query includes a two-dimensional view of the object and the data searched includes surround view(s).

FIG. 23 illustrates a particular example of a computer system that can be used with various embodiments of the present invention.

DETAILED DESCRIPTION

Reference will now be made in detail to some specific examples of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the present disclosure is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. Particular embodiments of the present invention may be implemented without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

Various aspects of the present invention relate generally to systems and methods for analyzing the spatial relationship between multiple images and video together with location information data, for the purpose of creating a single representation, a surround view, which eliminates redundancy in the data, and presents a user with an interactive and immersive active viewing experience. According to various embodiments, active is described in the context of providing a user with the ability to control the viewpoint of the visual information displayed on a screen. In particular example embodiments, the surround view data structure (and associated algorithms) is natively built for, but not limited to, applications involving visual search.

According to various embodiments of the present invention, a surround view is a multi-view interactive digital media representation. With reference to FIG. 1, shown is one example of a surround view acquisition system 100. In the present example embodiment, the surround view acquisition system 100 is depicted in a flow sequence that can be used to generate a surround view. According to various embodiments, the data used to generate a surround view can come from a variety of sources. In particular, data such as, but not limited to, two-dimensional (2D) images 104 can be used to generate a surround view. These 2D images can include color image data streams such as multiple image sequences, video data, etc., or multiple images in any of various formats for images, depending on the application. Another source of data that can be used to generate a surround view includes location information 106. This location information 106 can be obtained from sources such as accelerometers, gyroscopes, magnetometers, GPS, WiFi, IMU-like systems (Inertial Measurement Unit systems), and the like. Yet another source of data that can be used to generate a surround view can include depth images 108. These depth images can include depth, 3D, or disparity image data streams, and the like, and can be captured by devices such as, but not limited to, stereo cameras, time-of-flight cameras, three-dimensional cameras, and the like.

In the present example embodiment, the data can then be fused together at sensor fusion block 110. In some embodiments, a surround view can be generated from a combination of data that includes both 2D images 104 and location information 106, without any depth images 108 provided. In other embodiments, depth images 108 and location information 106 can be used together at sensor fusion block 110. Various combinations of image data can be used with location information at 106, depending on the application and available data.

In the present example embodiment, the data that has been fused together at sensor fusion block 110 is then used for content modeling 112 and context modeling 114. As described in more detail with regard to FIG. 4A, the subject matter featured in the images can be separated into content and context. The content can be delineated as the object of interest and the context can be delineated as the scenery surrounding the object of interest. According to various embodiments, the content can be a three-dimensional model, depicting an object of interest, although the content can be a two-dimensional image in some embodiments, as described in more detail below with regard to FIG. 4A. Furthermore, in some embodiments, the context can be a two-dimensional model depicting the scenery surrounding the object of interest. Although in many examples the context can provide two-dimensional views of the scenery surrounding the object of interest, the context can also include three-dimensional aspects in some embodiments. For instance, the context can be depicted as a “flat” image along a cylindrical “canvas,” such that the “flat” image appears on the surface of a cylinder. In addition, some examples may include three-dimensional context models, such as when some objects are identified in the surrounding scenery as three-dimensional objects. According to various embodiments, the models provided by content modeling 112 and context modeling 114 can be generated by combining the image and location information data, as described in more detail with regard to FIG. 3.

According to various embodiments, context and content of a surround view are determined based on a specified object of interest. In some examples, an object of interest is automatically chosen based on processing of the image and location information data. For instance, if a dominant object is detected in a series of images, this object can be selected as the content. In other examples, a user specified target 102 can be chosen, as shown in FIG. 1. It should be noted, however, that a surround view can be generated without a user specified target in some applications.

In the present example embodiment, one or more enhancement algorithms can be applied at enhancement algorithm(s) block 116. In particular example embodiments, various algorithms can be employed during capture of surround view data, regardless of the type of capture mode employed. These algorithms can be used to enhance the user experience. For instance, automatic frame selection, stabilization, view interpolation, filters, and/or compression can be used during capture of surround view data. In some examples, these enhancement algorithms can be applied to image data after acquisition of the data. In other examples, these enhancement algorithms can be applied to image data during capture of surround view data.

According to particular example embodiments, automatic frame selection can be used to create a more enjoyable surround view. Specifically, frames are automatically selected so that the transition between them will be smoother or more even. This automatic frame selection can incorporate blur- and overexposure-detection in some applications, as well as more uniformly sampling poses such that they are more evenly distributed.
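By way of illustration only, the frame selection described above can be sketched as follows. The sketch assumes each frame already carries a pose angle derived from the location information, a sharpness score, and an overexposure flag. The `Frame` fields, the `select_frames` function, and the threshold are hypothetical names introduced for this example and are not part of any described embodiment.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    angle_deg: float           # hypothetical pose angle from location data
    sharpness: float           # hypothetical score; higher means less blur
    overexposed: bool = False  # hypothetical overexposure flag


def select_frames(frames, num_keyframes, min_sharpness):
    """Pick frames whose poses are spread as evenly as possible."""
    # Reject blurred or overexposed frames before sampling poses.
    usable = [f for f in frames
              if f.sharpness >= min_sharpness and not f.overexposed]
    if not usable or num_keyframes <= 0:
        return []
    lo = min(f.angle_deg for f in usable)
    hi = max(f.angle_deg for f in usable)
    if num_keyframes == 1 or hi == lo:
        return [max(usable, key=lambda f: f.sharpness)]
    step = (hi - lo) / (num_keyframes - 1)
    selected = []
    for k in range(num_keyframes):
        # Choose the usable frame closest to each evenly spaced target pose.
        target = lo + k * step
        best = min(usable, key=lambda f: abs(f.angle_deg - target))
        if best not in selected:
            selected.append(best)
    return selected
```

In this sketch a rejected frame is simply replaced by its nearest usable neighbor in pose, so the selected poses remain approximately, rather than exactly, uniform.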

In some example embodiments, stabilization can be used for a surround view in a manner similar to that used for video. In particular, keyframes in a surround view can be stabilized to produce improvements such as smoother transitions, improved/enhanced focus on the content, etc. However, unlike video, there are many additional sources of stabilization for a surround view, such as by using IMU information, depth information, computer vision techniques, direct selection of an area to be stabilized, face detection, and the like.

For instance, IMU information can be very helpful for stabilization. In particular, IMU information provides an estimate, although sometimes a rough or noisy estimate, of the camera tremor that may occur during image capture. This estimate can be used to remove, cancel, and/or reduce the effects of such camera tremor.

In some examples, depth information, if available, can be used to provide stabilization for a surround view. Because points of interest in a surround view are three-dimensional, rather than two-dimensional, these points of interest are more constrained and tracking/matching of these points is simplified as the search space reduces. Furthermore, descriptors for points of interest can use both color and depth information and therefore, become more discriminative. In addition, automatic or semi-automatic content selection can be easier to provide with depth information. For instance, when a user selects a particular pixel of an image, this selection can be expanded to fill the entire surface that touches it. Furthermore, content can also be selected automatically by using a foreground/background differentiation based on depth. In various examples, the content can stay relatively stable/visible even when the context changes.
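As a non-limiting sketch of the semi-automatic selection described above, a selected pixel can be grown into a connected surface by visiting neighboring pixels whose depth is close to that of the pixel already selected. The function name `expand_selection`, the 4-connected neighborhood, and the `tolerance` parameter are assumptions made for this illustration only.

```python
from collections import deque


def expand_selection(depth, seed, tolerance):
    """Grow a selection from `seed` over pixels of similar depth.

    `depth` is a list of rows of depth values and `seed` is a (row, col)
    tuple. A neighbor joins the selection when its depth differs from the
    adjacent selected pixel by at most `tolerance`.
    """
    rows, cols = len(depth), len(depth[0])
    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbors of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in selected:
                continue
            if abs(depth[nr][nc] - depth[r][c]) <= tolerance:
                selected.add((nr, nc))
                queue.append((nr, nc))
    return selected
```

Comparing each neighbor with the adjacent selected pixel, rather than with the seed, lets the selection follow a surface that slopes gradually away from the camera; comparing against the seed depth instead would be an equally plausible design.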

According to various examples, computer vision techniques can also be used to provide stabilization for surround views. For instance, keypoints can be detected and tracked. However, in certain scenes, such as a dynamic scene or static scene with parallax, no simple warp exists that can stabilize everything. Consequently, there is a trade-off in which certain aspects of the scene receive more attention to stabilization and other aspects of the scene receive less attention. Because a surround view is often focused on a particular object of interest, a surround view can be content-weighted so that the object of interest is maximally stabilized in some examples.

Another way to improve stabilization in a surround view includes direct selection of a region of a screen. For instance, if a user taps to focus on a region of a screen, then records a convex surround view, the area that was tapped can be maximally stabilized. This allows stabilization algorithms to be focused on a particular area or object of interest.

In some examples, face detection can be used to provide stabilization. For instance, when recording with a front-facing camera, it is often likely that the user is the object of interest in the scene. Thus, face detection can be used to weight stabilization about that region. When face detection is precise enough, facial features themselves (such as eyes, nose, mouth) can be used as areas to stabilize, rather than using generic keypoints.

According to various examples, view interpolation can be used to improve the viewing experience. In particular, to avoid sudden “jumps” between stabilized frames, synthetic, intermediate views can be rendered on the fly. This can be informed by content-weighted keypoint tracks and IMU information as described above, as well as by denser pixel-to-pixel matches. If depth information is available, fewer artifacts resulting from mismatched pixels may occur, thereby simplifying the process. As described above, view interpolation can be applied during capture of a surround view in some embodiments. In other embodiments, view interpolation can be applied during surround view generation.

In some examples, filters can also be used during capture or generation of a surround view to enhance the viewing experience. Just as many popular photo sharing services provide aesthetic filters that can be applied to static, two-dimensional images, aesthetic filters can similarly be applied to surround views. However, because a surround view representation is more expressive than a two-dimensional image, and three-dimensional information is available in a surround view, these filters can be extended to include effects that are ill-defined in two-dimensional photos. For instance, in a surround view, motion blur can be added to the background (i.e. context) while the content remains crisp. In another example, a drop-shadow can be added to the object of interest in a surround view.

In various examples, compression can also be used as an enhancement algorithm 116. In particular, compression can be used to enhance user-experience by reducing data upload and download costs. Because surround views use spatial information, far less data can be sent for a surround view than a typical video, while maintaining desired qualities of the surround view. Specifically, the IMU, keypoint tracks, and user input, combined with the view interpolation described above, can all reduce the amount of data that must be transferred to and from a device during upload or download of a surround view. For instance, if an object of interest can be properly identified, a variable compression style can be chosen for the content and context. This variable compression style can include lower quality resolution for background information (i.e. context) and higher quality resolution for foreground information (i.e. content) in some examples. In such examples, the amount of data transmitted can be reduced by sacrificing some of the context quality, while maintaining a desired level of quality for the content.

In the present embodiment, a surround view 118 is generated after any enhancement algorithms are applied. The surround view can provide a multi-view interactive digital media representation. In various examples, the surround view can include a three-dimensional model of the content and a two-dimensional model of the context. However, in some examples, the context can represent a “flat” view of the scenery or background as projected along a surface, such as a cylindrical or other-shaped surface, such that the context is not purely two-dimensional. In yet other examples, the context can include three-dimensional aspects.

According to various embodiments, surround views provide numerous advantages over traditional two-dimensional images or videos. Some of these advantages include: the ability to cope with moving scenery, a moving acquisition device, or both; the ability to model parts of the scene in three-dimensions; the ability to remove unnecessary, redundant information and reduce the memory footprint of the output dataset; the ability to distinguish between content and context; the ability to use the distinction between content and context for improvements in the user-experience; the ability to use the distinction between content and context for improvements in memory footprint (an example would be high quality compression of content and low quality compression of context); the ability to associate special feature descriptors with surround views that allow the surround views to be indexed with a high degree of efficiency and accuracy; and the ability of the user to interact and change the viewpoint of the surround view. In particular example embodiments, the characteristics described above can be incorporated natively in the surround view representation, and provide the capability for use in various applications. For instance, surround views can be used to enhance various fields such as e-commerce, visual search, 3D printing, file sharing, user interaction, and entertainment.

According to various example embodiments, once a surround view 118 is generated, user feedback for acquisition 120 of additional image data can be provided. In particular, if a surround view is determined to need additional views to provide a more accurate model of the content or context, a user may be prompted to provide additional views. Once these additional views are received by the surround view acquisition system 100, these additional views can be processed by the system 100 and incorporated into the surround view.

With reference to FIG. 2, shown is an example of a process flow diagram for generating a surround view 200. In the present example, a plurality of images is obtained at 202. According to various embodiments, the plurality of images can include two-dimensional (2D) images or data streams. These 2D images can include location information that can be used to generate a surround view. In some embodiments, the plurality of images can include depth images 108, as also described above with regard to FIG. 1. The depth images can also include location information in various examples.

According to various embodiments, the plurality of images obtained at 202 can include a variety of sources and characteristics. For instance, the plurality of images can be obtained from a plurality of users. These images can be a collection of images gathered from the internet from different users of the same event, such as 2D images or video obtained at a concert, etc. In some examples, the plurality of images can include images with different temporal information. In particular, the images can be taken at different times of the same object of interest. For instance, multiple images of a particular statue can be obtained at different times of day, different seasons, etc. In other examples, the plurality of images can represent moving objects. For instance, the images may include an object of interest moving through scenery, such as a vehicle traveling along a road or a plane traveling through the sky. In other instances, the images may include an object of interest that is also moving, such as a person dancing, running, twirling, etc.

In the present example embodiment, the plurality of images is fused into content and context models at 204. According to various embodiments, the subject matter featured in the images can be separated into content and context. The content can be delineated as the object of interest and the context can be delineated as the scenery surrounding the object of interest. According to various embodiments, the content can be a three-dimensional model, depicting an object of interest, although the content can be a two-dimensional image in some embodiments.

According to the present example embodiment, one or more enhancement algorithms can be applied to the content and context models at 206. These algorithms can be used to enhance the user experience. For instance, enhancement algorithms such as automatic frame selection, stabilization, view interpolation, filters, and/or compression can be used. In some examples, these enhancement algorithms can be applied to image data during capture of the images. In other examples, these enhancement algorithms can be applied to image data after acquisition of the data.

In the present embodiment, a surround view is generated from the content and context models at 208. The surround view can provide a multi-view interactive digital media representation. In various examples, the surround view can include a three-dimensional model of the content and a two-dimensional model of the context. According to various embodiments, depending on the mode of capture and the viewpoints of the images, the surround view model can include certain characteristics. For instance, some examples of different styles of surround views include a locally concave surround view, a locally convex surround view, and a locally flat surround view. However, it should be noted that surround views can include combinations of views and characteristics, depending on the application.

With reference to FIG. 3, shown is one example of multiple camera views that can be fused together into a three-dimensional (3D) model to create an immersive experience. According to various embodiments, multiple images can be captured from various viewpoints and fused together to provide a surround view. In the present example embodiment, three cameras 312, 314, and 316 are positioned at locations 322, 324, and 326, respectively, in proximity to an object of interest 308. Scenery, such as object 310, can surround the object of interest 308. Views 302, 304, and 306 from their respective cameras 312, 314, and 316 include overlapping subject matter. Specifically, each view 302, 304, and 306 includes the object of interest 308 and varying degrees of visibility of the scenery surrounding the object 310. For instance, view 302 includes a view of the object of interest 308 in front of the cylinder that is part of the scenery surrounding the object 310. View 306 shows the object of interest 308 to one side of the cylinder, and view 304 shows the object of interest without any view of the cylinder.

In the present example embodiment, the various views 302, 304, and 306 along with their associated locations 322, 324, and 326, respectively, provide a rich source of information about object of interest 308 and the surrounding context that can be used to produce a surround view. For instance, when analyzed together, the various views 302, 304, and 306 provide information about different sides of the object of interest and the relationship between the object of interest and the scenery. According to various embodiments, this information can be used to parse out the object of interest 308 into content and the scenery as the context. Furthermore, as also described above with regard to FIGS. 1 and 2, various algorithms can be applied to images produced by these viewpoints to create an immersive, interactive experience when viewing a surround view.

FIG. 4A illustrates one example of separation of content and context in a surround view. According to various embodiments of the present invention, a surround view is a multi-view interactive digital media representation of a scene 400. With reference to FIG. 4A, shown is a user 402 located in a scene 400. The user 402 is capturing images of an object of interest, such as a statue. The images captured by the user constitute digital visual data that can be used to generate a surround view.

According to various embodiments of the present disclosure, the digital visual data included in a surround view can be, semantically and/or practically, separated into content 404 and context 406. According to particular embodiments, content 404 can include the object(s), person(s), or scene(s) of interest while the context 406 represents the remaining elements of the scene surrounding the content 404. In some examples, a surround view may represent the content 404 as three-dimensional data, and the context 406 as a two-dimensional panoramic background. In other examples, a surround view may represent both the content 404 and context 406 as two-dimensional panoramic scenes. In yet other examples, content 404 and context 406 may include three-dimensional components or aspects. In particular embodiments, the way that the surround view depicts content 404 and context 406 depends on the capture mode used to acquire the images.

In some examples, such as but not limited to: recordings of objects, persons, or parts of objects or persons, where only the object, person, or parts of them are visible, recordings of large flat areas, and recordings of scenes where the data captured appears to be at infinity (i.e., there are no subjects close to the camera), the content 404 and the context 406 may be the same. In these examples, the surround view produced may have some characteristics that are similar to other types of digital media such as panoramas. However, according to various embodiments, surround views include additional features that distinguish them from these existing types of digital media. For instance, a surround view can represent moving data. Additionally, a surround view is not limited to a specific cylindrical, spherical or translational movement. Various motions can be used to capture image data with a camera or other capture device. Furthermore, unlike a stitched panorama, a surround view can display different sides of the same object.

Although a surround view can be separated into content and context in some applications, a surround view can also be separated into layers in other applications. With reference to FIG. 4B, shown is one example of layering in a surround view. In this example, a layered surround view 410 is segmented into different layers 418, 420, and 422. Each layer 418, 420, and 422 can include an object (or a set of objects), people, dynamic scene elements, background, etc. Furthermore, each of these layers 418, 420, and 422 can be assigned a depth.

According to various embodiments, the different layers 418, 420, and 422 can be displayed in different ways. For instance, different filters (e.g. gray scale filter, blurring, etc.) can be applied to some layers but not to others. In other examples, different layers can be moved at different speeds relative to each other, such that when a user swipes through a surround view a better three-dimensional effect is provided. Similarly, when a user swipes along the parallax direction, the layers can be displaced differently to provide a better three-dimensional effect. In addition, one or more layers can be omitted when displaying a surround view, such that unwanted objects, etc. can be removed from a surround view.
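One possible way to displace layers differently during a swipe is sketched below, purely as an example. It assumes that each layer's displacement is inversely proportional to its assigned depth, so that nearer layers move farther than distant ones. This scaling rule, the function name `layer_offsets`, and the `reference_depth` parameter are assumptions introduced for illustration and are not drawn from the embodiments described herein.

```python
def layer_offsets(layer_depths, swipe_px, reference_depth=1.0):
    """Return a horizontal offset in pixels for each layer.

    `layer_depths` maps a layer name to its assigned depth (must be > 0).
    A layer at `reference_depth` moves by the full swipe distance; deeper
    layers move proportionally less, which produces a parallax effect.
    """
    offsets = {}
    for name, depth in layer_depths.items():
        if depth <= 0:
            raise ValueError(f"layer {name!r} must have a positive depth")
        offsets[name] = swipe_px * reference_depth / depth
    return offsets


# Example: three layers at increasing depth respond to a 40-pixel swipe.
offsets = layer_offsets({"front": 1.0, "middle": 2.0, "back": 4.0}, 40.0)
```

Omitting a layer from display, as also described above, would amount to leaving that layer out of the mapping before the offsets are computed.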

In the present example, a user 412 is shown holding a capture device 414. The user 412 moves the capture device 414 along capture motion 416. When the images captured are used to generate a surround view, layers 418, 420, and 422 are separated based on depth. These layers can then be processed or viewed differently in a surround view, depending on the application.

With reference to FIG. 4C, shown is one example of a process 430 for generating a surround view with a modified layer. In particular, a first surround view having a first layer and a second layer is obtained at 432. As described above with regard to FIG. 4B, a surround view can be divided into different layers. In the present example, the first layer includes a first depth and the second layer includes a second depth.

Next, the first layer is selected at 434. According to various examples, selecting the first layer includes selecting data within the first depth. More specifically, selecting data within the first depth includes selecting the visual data located within the first depth. According to various embodiments, the first layer can include features such as an object, person, dynamic scene elements, background, etc. In some examples, selection of the first layer is performed automatically without user input. In other examples, selection of the first layer is performed semi-automatically using user-guided interaction.

After the first layer is selected, an effect is applied to the first layer within the first surround view to produce a modified first layer at 436. In one example, the effect applied can be a filter such as a blurring filter, gray scale filter, etc. In another example, the effect applied can include moving the first layer at a first speed relative to the second layer, which is moved at a second speed. When the first speed is different from the second speed, three-dimensional effects can be improved in some instances. In some applications, a parallax effect can occur, thereby creating a three-dimensional effect.

Next, a second surround view is generated that includes the modified first layer and the second layer at 438. As described above, applying one or more effects to the first layer can improve the three-dimensional effects of a surround view in some applications. In these applications, the second surround view can have improved three-dimensional effects when compared to the first surround view. Other effects can be applied in different examples, and can emphasize or deemphasize various aspects of a first surround view to yield a second surround view. In addition, in some applications, a layer can be omitted in a second surround view. Specifically, when the first surround view includes a third layer, the second surround view omits this third layer. In one example, this third layer could include an object or person that would be “edited out” in the generated second surround view. In another example, this third layer could include a background or background elements, and the second surround view generated would not include the background or background elements. Of course, any object or feature can be located in this omitted third layer, depending on the application.
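
By way of example only, the operations at 432, 434, 436, and 438 can be sketched in Python as follows. The Layer and SurroundView classes, the gray_scale effect, and the generate_second_view function are hypothetical names introduced for this illustration, and representing a layer as a depth value plus a tuple of RGB pixels is a simplifying assumption rather than the representation used by any particular embodiment.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Tuple

Pixel = Tuple[int, int, int]


@dataclass(frozen=True)
class Layer:
    depth: float
    pixels: Tuple[Pixel, ...]  # simplified stand-in for a layer's visual data


@dataclass(frozen=True)
class SurroundView:
    layers: Tuple[Layer, ...]


def gray_scale(layer: Layer) -> Layer:
    """Example effect: replace each RGB pixel with its integer channel average."""
    gray = tuple((sum(p) // 3,) * 3 for p in layer.pixels)
    return replace(layer, pixels=gray)


def generate_second_view(first_view: SurroundView,
                         selected_depth: float,
                         effect: Callable[[Layer], Layer],
                         omitted_depths: Iterable[float] = ()) -> SurroundView:
    """Select the layer at selected_depth, apply effect, and drop omitted layers."""
    omitted = set(omitted_depths)
    layers = []
    for layer in first_view.layers:
        if layer.depth in omitted:
            continue  # this layer is "edited out" of the second surround view
        if layer.depth == selected_depth:
            layer = effect(layer)  # produces the modified first layer
        layers.append(layer)
    return SurroundView(tuple(layers))


# Hypothetical first surround view with three layers at depths 1.0, 5.0, 9.0.
first = SurroundView((
    Layer(1.0, ((30, 60, 90),)),
    Layer(5.0, ((10, 20, 30),)),
    Layer(9.0, ((0, 0, 0),)),
))
second = generate_second_view(first, 1.0, gray_scale, omitted_depths=[9.0])
```

Here the first surround view is left unchanged and a second surround view is returned, mirroring the distinction drawn above between the first and second surround views.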

FIGS. 5A-5B illustrate examples of concave and convex views, respectively, where both views use a back-camera capture style. In particular, if a camera phone is used, these views use the camera on the back of the phone, facing away from the user. In particular embodiments, concave and convex views can affect how the content and context are designated in a surround view.

With reference to FIG. 5A, shown is one example of a concave view 500 in which a user is standing along a vertical axis 508. In this example, the user is holding a camera, such that camera location 502 does not leave axis 508 during image capture. However, as the user pivots about axis 508, the camera captures a panoramic view of the scene around the user, forming a concave view. In this embodiment, the object of interest 504 and the distant scenery 506 are all viewed similarly because of the way in which the images are captured. In this example, all objects in the concave view appear at infinity, so the content is equal to the context according to this view.

With reference to FIG. 5B, shown is one example of a convex view 520 in which a user changes position when capturing images of an object of interest 524. In this example, the user moves around the object of interest 524, taking pictures from different sides of the object of interest from camera locations 528, 530, and 532. Each of the images obtained includes a view of the object of interest, and a background of the distant scenery 526. In the present example, the object of interest 524 represents the content, and the distant scenery 526 represents the context in this convex view.

FIGS. 6A-6E illustrate examples of various capture modes for surround views. Although various motions can be used to capture a surround view and are not constrained to any particular type of motion, three general types of motion can be used to capture particular features or views described in conjunction with surround views. These three types of motion, respectively, can yield a locally concave surround view, a locally convex surround view, and a locally flat surround view. In some examples, a surround view can include various types of motions within the same surround view.

With reference to FIG. 6A, shown is an example of a back-facing, concave surround view being captured. According to various embodiments, a locally concave surround view is one in which the viewing angles of the camera or other capture device diverge. In one dimension this can be likened to the motion required to capture a spherical 360 panorama (pure rotation), although the motion can be generalized to any curved sweeping motion in which the view faces outward. In the present example, the experience is that of a stationary viewer looking out at a (possibly dynamic) context.

In the present example embodiment, a user 602 is using a back-facing camera 606 to capture images towards world 600, and away from user 602. As described in various examples, a back-facing camera refers to a device with a camera that faces away from the user, such as the camera on the back of a smart phone. The camera is moved in a concave motion 608, such that views 604 a, 604 b, and 604 c capture various parts of capture area 609.

With reference to FIG. 6B, shown is an example of a back-facing, convex surround view being captured. According to various embodiments, a locally convex surround view is one in which viewing angles converge toward a single object of interest. In some examples, a locally convex surround view can provide the experience of orbiting about a point, such that a viewer can see multiple sides of the same object. This object, which may be an “object of interest,” can be segmented from the surround view to become the content, and any surrounding data can be segmented to become the context. Previous technologies fail to recognize this type of viewing angle in the media-sharing landscape.

In the present example embodiment, a user 602 is using a back-facing camera 614 to capture images towards world 600, and away from user 602. The camera is moved in a convex motion 610, such that views 612 a, 612 b, and 612 c capture various parts of capture area 611. As described above, world 600 can include an object of interest in some examples, and the convex motion 610 can orbit around this object. Views 612 a, 612 b, and 612 c can include views of different sides of this object in these examples.

With reference to FIG. 6C, shown is an example of a front-facing, concave surround view being captured. As described in various examples, a front-facing camera refers to a device with a camera that faces towards the user, such as the camera on the front of a smart phone. For instance, front-facing cameras are commonly used to take “selfies” (i.e., self-portraits of the user).

In the present example embodiment, camera 620 is facing user 602. The camera follows a concave motion 606 such that the views 618 a, 618 b, and 618 c diverge from each other in an angular sense. The capture area 617 follows a concave shape that includes the user at a perimeter.

With reference to FIG. 6D, shown is an example of a front-facing, convex surround view being captured. In the present example embodiment, camera 626 is facing user 602. The camera follows a convex motion 622 such that the views 624 a, 624 b, and 624 c converge towards the user 602. The capture area 617 follows a concave shape that surrounds the user 602.

With reference to FIG. 6E, shown is an example of a back-facing, flat view being captured. In particular example embodiments, a locally flat surround view is one in which the rotation of the camera is small compared to its translation. In a locally flat surround view, the viewing angles remain roughly parallel, and the parallax effect dominates. In this type of surround view, there can also be an “object of interest,” but its position does not remain fixed in the different views. Previous technologies also fail to recognize this type of viewing angle in the media-sharing landscape.

In the present example embodiment, camera 632 is facing away from user 602, and towards world 600. The camera follows a generally linear motion 628 such that the capture area 629 generally follows a line. The views 630 a, 630 b, and 630 c have generally parallel lines of sight. An object viewed in multiple views can appear to have different or shifted background scenery in each view. In addition, a slightly different side of the object may be visible in different views. Using the parallax effect, information about the position and characteristics of the object can be generated in a surround view that provides more information than any one static image.

As described above, various modes can be used to capture images for a surround view. These modes, including locally concave, locally convex, and locally linear motions, can be used during capture of separate images or during continuous recording of a scene. Such recording can capture a series of images during a single session.
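
The three local capture modes described above can be distinguished, in one simplified and purely illustrative approach, from two camera poses. The Python sketch below works in a two-dimensional top-down plane; the function name classify_local_motion, the flat_angle_rad threshold, and the dot-product convergence test are assumptions of this example and are not drawn from any particular embodiment.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def classify_local_motion(p1: Vec, d1: Vec, p2: Vec, d2: Vec,
                          flat_angle_rad: float = 0.05) -> str:
    """Classify the motion between two camera poses (position p, view direction d).

    Assumptions (illustrative only): poses lie in a 2D plane, and a rotation
    below flat_angle_rad with non-zero translation counts as "locally flat".
    """
    n1, n2 = math.hypot(*d1), math.hypot(*d2)
    u1 = (d1[0] / n1, d1[1] / n1)
    u2 = (d2[0] / n2, d2[1] / n2)
    # Angle between the two viewing directions.
    cos_angle = max(-1.0, min(1.0, u1[0] * u2[0] + u1[1] * u2[1]))
    rotation = math.acos(cos_angle)
    tx, ty = p2[0] - p1[0], p2[1] - p1[1]
    if rotation < flat_angle_rad and math.hypot(tx, ty) > 0.0:
        return "locally flat"  # viewing angles remain roughly parallel
    # Negative when the directions turn toward each other as the camera moves
    # (converging views); zero or positive when they diverge or the camera
    # only pivots in place.
    convergence = tx * (u2[0] - u1[0]) + ty * (u2[1] - u1[1])
    return "locally convex" if convergence < 0.0 else "locally concave"


# Camera orbiting the origin while looking inward at it.
orbit = classify_local_motion((1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0))
# Camera pivoting in place, as in a pure rotation.
pivot = classify_local_motion((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
# Camera translating sideways with a fixed viewing direction.
slide = classify_local_motion((0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0))
```

A classification of this kind could be applied to consecutive pose pairs so that a single recording containing several motion types is labeled piecewise.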

According to various embodiments of the present invention, a surround view can be generated from data acquired in numerous ways. FIG. 7A illustrates one example of a process for recording data that can be used to generate a surround view. In this example, data is acquired by moving a camera through space. In particular, a user taps a record button 702 on a capture device 700 to begin recording. As movement of the capture device 716 follows a generally leftward direction, an object 714 moves in a generally rightward motion across the screen, as indicated by movement of object 716. Specifically, the user presses the record button 702 in view 708, and then moves the capture device leftward in view 710. As the capture device moves leftward, object 714 appears to move rightward between views 710 and 712. In some examples, when the user is finished recording, the record button 702 can be tapped again. In other examples, the user can tap and hold the record button during recording, and release to stop recording. In the present embodiment, the recording captures a series of images that can be used to generate a surround view.

According to various embodiments, different types of panoramas can be captured in surround views, depending on the type of movement used in the capture process. In particular, dynamic panoramas, object panoramas, and selfie panoramas can be generated based on captured data. In some embodiments, the captured data can be recorded as described with regard to FIG. 7A.

FIGS. 7B-7F illustrate examples relating to dynamic panoramas that canbe created with surround views. With particular reference to FIG. 7B,shown is one example of a dynamic panorama capture process 720. In thepresent example, a user 722 moves capture device 724 along capturemotion 726. This capture motion 726 can include rotating, waving,translating, etc. the capture device 724. During this capture process, apanorama of scene 728 is generated and dynamic content within the sceneis kept. For instance, moving objects are preserved within the panoramaas dynamic content.

With reference to FIG. 7C, shown is a specific example of a dynamicpanorama capture process 730 where a capture device 732 is rotatedthrough an axis of rotation 734. In particular, capture device 732 isrotated about its center along an axis of rotation 734. This purerotation captures a panorama of scene 736. According to variousexamples, this type of panorama can provide a “flat” scene that capturesentities in the scene at a particular point in time. This “flat” scenecan be a two-dimensional image, or can be an image projected on acylinder, surface, etc.

With reference to FIG. 7D, shown is one example of a dynamic panorama740 with dynamic content 744. Once a panorama is captured, as describedabove with regard to FIGS. 7B-7C, a dynamic panorama 740 can benavigated by a user. In the present example, dynamic content 744 isanimated when the user navigates through the dynamic panorama 740. Forinstance, as the user swipes across scene 742, the dynamic content 744can be seen moving with respect to the scene 742.

With reference to FIG. 7E, shown is one example of capturing a dynamicpanorama with a 3D effect. In the present example, if a capture deviceis not rotated exactly around its camera center (as in FIG. 7C), a 3Deffect can be obtained by moving different parts of the panorama atdifferent speeds while the user navigates through the dynamic content.Although a nearby person or object 750 would create artifacts in astandard panorama capture process if the capture device is not rotatedaround its camera center (as in FIG. 7C), these “imperfections” can beused to create a 3D impression to the user by moving the object 750 at adifferent speed when swiping/navigating through a dynamic panorama. Inparticular, the capture device 745 shown uses a capture motion 748 thatcaptures a distant scene 746 and a nearby person/object 750. Themovements of the nearby person/object 750 can be captured as 3D motionwithin the surround view, while the distant scenery 746 appears to bestatic as the user navigates through the surround view, according tovarious embodiments.

With reference to FIG. 7F, shown is one example of a dynamic panorama750 with parallax effect. Three-dimensional effects can be presented byapplying a parallax effect when swiping perpendicular to the panoramadirection 752. In particular, when swiping perpendicular to the panoramadirection, along the parallax direction 754, nearby objects aredisplaced along the parallax direction 754 while the scene at distancestays still or moves less than the nearby objects.

FIGS. 7G-7J illustrate examples relating to object panoramas that can becreated with surround views. With reference to FIG. 7G, shown is oneexample of an object panorama capture process. In particular, a capturedevice 766 is moved around an object 762 along a capture motion 760. Oneparticular example of a capture device 766 is a smartphone. The capturedevice 766 also captures a panoramic view of the background 764 asvarious views and angles of the object 762 are captured. The resultingsurround view includes a panoramic view of object 762.

In some embodiments, a surround view can be created by projecting anobject panorama onto a background panorama, an example of which is shownin FIG. 7H. In particular, a panorama 768 of this kind is built usingbackground panorama 770 and projecting a foreground object panorama 772onto the background panorama 770. In some examples, an object panoramacan be segmented content taken from a surround view, as described inmore detail with regard to FIGS. 17A-17B.

According to various embodiments, multiple objects can make up an objectpanorama. With reference to FIG. 7I, shown is one example of a captureprocess for a group of objects 780 making up an object panorama. Asshown, a capture device 776 can move around a foreground object, whichcan be a single object or a group of objects 780 located at a similardistance to the capture device. The capture device 776 can move aroundthe object or group of objects 780 along a capture motion 778, such thatvarious views and angles of the objects are captured. The resultingsurround view can include an object panorama of the group of objects 780with distant background 782 as the context.

Object panoramas allow users to navigate around the object, according tovarious examples. With reference to FIG. 7J, shown is one example ofchanging the viewing angle of an object panorama based on usernavigation. In this example, three views are shown of a surround viewpanorama 784. In the surround view panorama, a foreground object 786 isshown in front of a background panorama 788. As a user navigates thepanorama by swiping or otherwise interacting with the surround view, thelocation of the object, the viewing angle of the object, or both can bechanged. In the present example, the user can swipe in the direction ofthe main panorama axis. This navigation can rotate the foreground object786 in this view. In some examples, the distant background panorama 788may not change as the foreground object panorama rotates or otherwisemoves.

According to various embodiments, object panoramas can also includeparallax effects. These parallax effects can be seen whenswiping/navigating perpendicular to the direction of the main panoramaaxis. Similar to FIG. 7F, three-dimensional effects can be presentedwhen swiping perpendicular to the panorama direction. In particular,when swiping perpendicular to the panorama direction, along the parallaxdirection, nearby objects are displaced along the parallax directionwhile the scene at distance stays still or moves less than the nearbyobjects.

Although the previous examples relate to static content and backgroundcontext in object panoramas, dynamic content can be integrated in theobject panorama for either or both the foreground object and thebackground context. For instance, dynamic content can be featured in amanner similar to that described in conjunction with FIG. 7D. Similarly,dynamic context can also be included in object panoramas.

Another type of panorama that can be included in surround views is aselfie panorama. In some examples, a selfie panorama can be segmentedcontent taken from a surround view, as described in more detail withregard to FIGS. 17A-17B. FIGS. 7K-7L illustrate examples relating toselfie panoramas that can be created with surround views. With referenceto FIG. 7K, shown is one example of a selfie panorama capture process790. In particular, a user 794 moves a capture device 792 along capturemotion 796 while capturing images of the user 794. In some examples, thecapture device 792 can use a front-facing camera, such as one includedon a smart phone. In other examples, a digital camera or other imagerecording device can be used. A selfie panorama is created with theseimages, with background 798 providing the context.

With reference to FIG. 7L, shown is one example of a background panoramawith a selfie panorama projected on it. In the present example, asurround view panorama 723 is built from a background panorama 725 witha selfie panorama 721 projected on it. According to various examples,the selfie panorama can include a single person or multiple people,similar to the object or group of objects described in conjunction withFIG. 7I. In the present example, selfie panoramas can include dynamiccontent. For instance, the user can look at the capture device as thecapture device moves or the user can keep still while moving the capturedevice. The user's movements can be captured while the selfie panorama721 is recorded. These dynamic elements will be mapped into the panoramaand can be displayed while interacting with the resulting selfiepanorama 721. For instance, the user's blinks can be recorded andcaptured. Navigation of the selfie panorama can be done in a mannersimilar to that described in conjunction with FIG. 7J. In particular,the location and viewpoint of the person(s) in the selfie panorama 721can be changed by the user by swiping/navigating in the direction of themain panorama axis. According to various embodiments, selfie panoramas721 can also include parallax effects. These parallax effects can beseen when swiping/navigating perpendicular to the direction of the mainpanorama axis. In addition, similar to FIG. 7F, three-dimensionaleffects can be presented when swiping perpendicular to the panoramadirection. In particular, when swiping perpendicular to the panoramadirection, along the parallax direction, nearby objects are displacedalong the parallax direction while the scene at distance stays still ormoves less than the nearby objects.

As described above, various types of panoramas can be created withsurround views. In addition, surround views can be viewed and navigatedin different ways. With reference to FIG. 7M, shown is one example ofextended views of panoramas that are provided based on user navigation.In the present example, possible views 727 include a full panorama view729, recording views 731, and extended view 733. A full panorama view729 includes a full view of the information in a surround view. Therecording views 731 include the visual data captured in images and/orrecordings. The extended view 733 shows more than what is visible duringone point in time in recording views 731 but less than the full panoramaview 729. The portion of the panorama 729 that is visible in an extendedview 733 is defined by user navigation. An extended view 733 isespecially interesting for a selfie or object panorama, because theextended view follows the object/person in the panorama and shows alarger view than what was visible for the camera while recording.Essentially, more context is provided to the user in an extended view733 during navigation of the surround view.
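
As a hedged illustration of how the visible portion of an extended view might be derived from user navigation, the Python sketch below computes a horizontal window within the full panorama. The function name extended_view_window, the extension factor, and the use of a one-dimensional pixel interval are hypothetical simplifications introduced for this example only.

```python
from typing import Tuple


def extended_view_window(center: float,
                         recording_width: float,
                         full_width: float,
                         extension: float = 0.5) -> Tuple[float, float]:
    """Return the (left, right) extent of an extended view inside the panorama.

    Assumption (illustrative only): the extended view is wider than a
    recording view by the given extension factor, never wider than the full
    panorama, and centered on the navigation position where possible.
    """
    width = min(full_width, recording_width * (1.0 + extension))
    left = min(max(center - width / 2.0, 0.0), full_width - width)
    return (left, left + width)


# Hypothetical panorama 100 units wide with recording views 20 units wide.
middle = extended_view_window(50.0, 20.0, 100.0)
near_left_edge = extended_view_window(5.0, 20.0, 100.0)
```

The window is clamped at the panorama boundaries so that the extended view never shows area outside the full panorama view.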

According to various embodiments, once a series of images is captured,these images can be used to generate a surround view. With reference toFIG. 8, shown is an example of a surround view in whichthree-dimensional content is blended with a two-dimensional panoramiccontext. In the present example embodiment, the movement of capturedevice 820 follows a locally convex motion, such that the capture devicemoves around the object of interest (i.e., a person sitting in a chair).The object of interest is delineated as the content 808, and thesurrounding scenery (i.e., the room) is delineated as the context 810.In the present embodiment, as the movement of the capture device 820moves leftwards around the content 808, the direction of contentrotation relative to the capture device 812 is in a rightward,counterclockwise direction. Views 802, 804, and 806 show a progressionof the rotation of the person sitting in a chair relative to the room.

According to various embodiments, a series of images used to generate a surround view can be captured by a user recording a scene, object of interest, etc. Additionally, in some examples, multiple users can contribute to acquiring a series of images used to generate a surround view. With reference to FIG. 9, shown is one example of a space-time surround view being simultaneously recorded by independent observers.

In the present example embodiment, cameras 904, 906, 908, 910, 912, and914 are positioned at different locations. In some examples, thesecameras 904, 906, 908, 910, 912, and 914 can be associated withindependent observers. For instance, the independent observers could beaudience members at a concert, show, event, etc. In other examples,cameras 904, 906, 908, 910, 912, and 914 could be placed on tripods,stands, etc. In the present embodiment, the cameras 904, 906, 908, 910,912, and 914 are used to capture views 904 a, 906 a, 908 a, 910 a, 912a, and 914 a, respectively, of an object of interest 900, with world 902providing the background scenery. The images captured by cameras 904,906, 908, 910, 912, and 914 can be aggregated and used together in asingle surround view in some examples. Each of the cameras 904, 906,908, 910, 912, and 914 provides a different vantage point relative tothe object of interest 900, so aggregating the images from thesedifferent locations provides information about different viewing anglesof the object of interest 900. In addition, cameras 904, 906, 908, 910,912, and 914 can provide a series of images from their respectivelocations over a span of time, such that the surround view generatedfrom these series of images can include temporal information and canalso indicate movement over time.

As described above with regard to various embodiments, surround viewscan be associated with a variety of capture modes. In addition, asurround view can include different capture modes or different capturemotions in the same surround view. Accordingly, surround views can beseparated into smaller parts in some examples. With reference to FIG.10, shown is one example of separation of a complex surround-view intosmaller, linear parts. In the present example, complex surround view1000 includes a capture area 1026 that follows a sweeping L motion,which includes two separate linear motions 1022 and 1024 of camera 1010.The surround views associated with these separate linear motions can bebroken down into linear surround view 1002 and linear surround view1004. It should be noted that although linear motions 1022 and 1024 canbe captured sequentially and continuously in some embodiments, theselinear motions 1022 and 1024 can also be captured in separate sessionsin other embodiments.

In the present example embodiment, linear surround view 1002 and linearsurround view 1004 can be processed independently, and joined with atransition 1006 to provide a continuous experience for the user.Breaking down motion into smaller linear components in this manner canprovide various advantages. For instance, breaking down these smallerlinear components into discrete, loadable parts can aid in compressionof the data for bandwidth purposes. Similarly, non-linear surround viewscan also be separated into discrete components. In some examples,surround views can be broken down based on local capture motion. Forexample, a complex motion may be broken down into a locally convexportion and a linear portion. In another example, a complex motion canbe broken down into separate locally convex portions. It should berecognized that any number of motions can be included in a complexsurround view 1000, and that a complex surround view 1000 can be brokendown into any number of separate portions, depending on the application.
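
One simplified way to break a complex capture motion into linear components is sketched below in Python. The function name split_into_linear_parts, the max_turn_deg threshold, and the representation of the capture motion as a list of two-dimensional camera positions are assumptions made for illustration; other decompositions, including those based on locally convex portions, are equally possible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def split_into_linear_parts(path: Sequence[Point],
                            max_turn_deg: float = 30.0) -> List[List[Point]]:
    """Split a camera path wherever the heading changes by more than max_turn_deg.

    Adjacent parts share the point at which the split occurs, which is where
    a transition between the parts could be placed.
    """
    if len(path) < 2:
        return [list(path)]
    parts = [[path[0], path[1]]]
    prev_heading = math.atan2(path[1][1] - path[0][1], path[1][0] - path[0][0])
    for a, b in zip(path[1:], path[2:]):
        heading = math.atan2(b[1] - a[1], b[0] - a[0])
        # Wrap the heading difference into [-pi, pi) before comparing.
        diff = (heading - prev_heading + math.pi) % (2.0 * math.pi) - math.pi
        if abs(math.degrees(diff)) > max_turn_deg:
            parts.append([a, b])  # start a new linear part at the corner
        else:
            parts[-1].append(b)
        prev_heading = heading
    return parts


# Hypothetical L-shaped capture motion made of two straight sweeps.
l_motion = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
linear_parts = split_into_linear_parts(l_motion)
```

Each returned part could then be processed and loaded independently, consistent with the discrete, loadable parts described above.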

Although in some applications, it is desirable to separate complexsurround views, in other applications it is desirable to combinemultiple surround views. With reference to FIG. 11, shown is one exampleof a graph that includes multiple surround views combined into amulti-surround view 1100. In this example, the rectangles representvarious surround views 1102, 1104, 1106, 1108, 1110, 1112, 1114, and1116, and the length of each rectangle indicates the dominant motion ofeach surround view. Lines between the surround views indicate possibletransitions 1118, 1120, 1122, 1124, 1126, 1128, 1130, and 1132 betweenthem.

In some examples, a surround view can provide a way to partition a sceneboth spatially and temporally in a very efficient manner. For very largescale scenes, multi-surround view 1100 data can be used. In particular,a multi-surround view 1100 can include a collection of surround viewsthat are connected together in a spatial graph. The individual surroundviews can be collected by a single source, such as a single user, or bymultiple sources, such as multiple users. In addition, the individualsurround views can be captured in sequence, in parallel, or totallyuncorrelated at different times. However, in order to connect theindividual surround views, there must be some overlap of content,context, or location, or of a combination of these features.Accordingly, any two surround views would need to have some overlap incontent, context, and/or location to provide a portion of amulti-surround view 1100. Individual surround views can be linked to oneanother through this overlap and stitched together to form amulti-surround view 1100. According to various examples, any combinationof capture devices with either front, back, or front and back camerascan be used.
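
The linking of individual surround views through overlap can be illustrated with the following Python sketch, which builds a simple undirected graph. The function name build_multi_surround_view and the representation of each surround view as a set of descriptive labels (standing in for content, context, or location features) are hypothetical choices for this example; an actual system would likely determine overlap from image matching, location data, or similar signals.

```python
from itertools import combinations
from typing import Dict, Set


def build_multi_surround_view(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Connect every pair of surround views that share at least one feature.

    views maps a surround view identifier to a set of feature labels; the
    result maps each identifier to the identifiers it can transition to.
    """
    graph: Dict[str, Set[str]] = {name: set() for name in views}
    for a, b in combinations(views, 2):
        if views[a] & views[b]:  # some overlap in content, context, or location
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical surround views described by illustrative feature labels.
example_views = {
    "entrance": {"door", "hallway"},
    "hallway": {"hallway", "staircase"},
    "garden": {"fountain"},
}
scene_graph = build_multi_surround_view(example_views)
```

In this example the surround view labeled "garden" has no overlap with the others and therefore remains unconnected, reflecting the overlap requirement described above.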

In some embodiments, multi-surround views 1100 can be generalized tomore fully capture entire environments. Much like “photo tours” collectphotographs into a graph of discrete, spatially-neighboring components,multiple surround views can be combined into an entire scene graph. Insome examples, this can be achieved using information obtained from butnot limited to: image matching/tracking, depth matching/tracking, IMU,user input, and/or GPS. Within such a graph or multi-surround view, auser can switch between different surround views either at the endpoints of the recorded motion or wherever there is an overlap with othersurround views in the graph. One advantage of multi-surround views over“photo tours” is that a user can navigate the surround views as desiredand much more visual information can be stored in surround views. Incontrast, traditional “photo tours” typically have limited views thatcan be shown to the viewer either automatically or by allowing the userto pan through a panorama with a computer mouse or keystrokes.

According to various embodiments, a surround view is generated from a set of images. These images can be captured by a user intending to produce a surround view or retrieved from storage, depending on the application. Because a surround view is not limited or restricted with respect to a certain amount of visibility, it can provide significantly more visual information about different views of an object or scene. More specifically, although a single viewpoint may be too ambiguous to adequately describe a three-dimensional object, multiple views of the object can provide more specific and detailed information. These multiple views can provide enough information to allow a visual search query to yield more accurate search results. Because a surround view provides views from many sides of an object, distinctive views that are appropriate for search can be selected from the surround view or requested from a user if a distinctive view is not available. For instance, if the data captured or otherwise provided is not sufficient to allow recognition or generation of the object or scene of interest with a sufficiently high certainty, a capturing system can guide a user to continue moving the capturing device or provide additional image data. In particular embodiments, if a surround view is determined to need additional views to produce a more accurate model, a user may be prompted to provide additional images.

With reference to FIG. 12, shown is one example of a process 1200 for prompting a user for additional images to provide a more accurate surround view. In the present example, images are received from a capturing device or storage at 1202. Next, a determination is made whether the images provided are sufficient to allow recognition of an object of interest at 1204. If the images are not sufficient to allow recognition of an object of interest, then a prompt is given for the user to provide additional image(s) from different viewing angles at 1206. In some examples, prompting a user to provide one or more additional images from different viewing angles can include suggesting one or more particular viewing angles. If the user is actively capturing images, the user can be prompted when a distinct viewing angle is detected in some instances. According to various embodiments, suggestions to provide one or more particular viewing angles can be determined based on the locations associated with the images already received. In addition, prompting a user to provide one or more additional images from different viewing angles can include suggesting using a particular capture mode such as a locally concave surround view, a locally convex surround view, or a locally flat surround view, depending on the application.

Next, the system receives these additional image(s) from the user at 1208. Once the additional images are received, a determination is made again whether the images are sufficient to allow recognition of an object of interest. This process continues until a determination is made that the images are sufficient to allow recognition of an object of interest. In some embodiments, the process can end at this point and a surround view can be generated.

Optionally, once a determination is made that the images are sufficient to allow recognition of an object of interest, a determination can then be made whether the images are sufficient to distinguish the object of interest from similar but non-matching items at 1210. This determination can be helpful especially when using visual search, examples of which are described in more detail below with regard to FIGS. 19-22. In particular, an object of interest may have distinguishing features that can be seen from particular angles that require additional views. For instance, a portrait of a person may not sufficiently show the person's hairstyle if only pictures are taken from the front angles. Additional pictures of the back of the person may need to be provided to determine whether the person has short hair or just a pulled-back hairstyle. In another example, a picture of a person wearing a shirt might warrant additional prompting if it is plain on one side and additional views would show prints or other insignia on the sleeves or back, etc.

In some examples, determining that the images are not sufficient to distinguish the object of interest from similar but non-matching items includes determining that the number of matching search results exceeds a predetermined threshold. In particular, if a large number of search results are found, then it can be determined that additional views may be needed to narrow the search criteria. For instance, if a search for a mug yields a large number of matches, such as more than 20, then additional views of the mug may be needed to prune the search results.

If the images are not sufficient to distinguish the object of interest from similar but non-matching items at 1210, then a prompt is given for the user to provide additional image(s) from different viewing angles at 1212. In some examples, prompting a user to provide one or more additional images from different viewing angles can include suggesting one or more particular viewing angles. If the user is actively capturing images, the user can be prompted when a distinct viewing angle is detected in some instances. According to various embodiments, suggestions to provide one or more particular viewing angles can be determined based on the locations associated with the images already received. In addition, prompting a user to provide one or more additional images from different viewing angles can include suggesting using a particular capture mode such as a locally concave surround view, a locally convex surround view, or a locally flat surround view, depending on the application.

Next, the system receives these additional image(s) from the user at 1214. Once the additional images are received, a determination is made again whether the images are sufficient to distinguish the object of interest from similar but non-matching items. This process continues until a determination is made that the images are sufficient to distinguish the object of interest from similar but non-matching items. Next, the process ends and a surround view can be generated from the images.
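The two-stage flow of FIG. 12 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the recognition check, the distinctiveness check, and the prompt are passed in as hypothetical callables (`is_recognizable`, `is_distinctive`, `request_more`), and the `max_prompts` cap is an added safeguard that the described process does not include. The helper `too_many_matches` illustrates the threshold test, with 20 used only because it is the example figure given above.

```python
def too_many_matches(num_matches, threshold=20):
    """True when search results exceed the predetermined threshold."""
    return num_matches > threshold


def collect_images(images, is_recognizable, is_distinctive, request_more,
                   max_prompts=10):
    """Prompt for more images until both checks pass (hypothetical sketch).

    is_recognizable / is_distinctive: callables taking the image list and
    returning bool (stand-ins for the determinations at 1204 and 1210).
    request_more: callable taking the image list and returning new images
    (stand-in for the prompts at 1206 and 1212).
    Returns (images, succeeded).
    """
    images = list(images)
    prompts = 0
    # Stage 1: recognition of the object; stage 2: distinguishing it
    # from similar but non-matching items.
    for check in (is_recognizable, is_distinctive):
        while not check(images):
            if prompts >= max_prompts:  # assumed cap, not in the source
                return images, False
            images.extend(request_more(images))
            prompts += 1
    return images, True
```

In an implementation where the second stage is optional, the distinctiveness check could simply be a callable that always returns True.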

With reference to FIGS. 13A-13B, shown are examples of prompts requesting additional images from a user in order to produce a more accurate surround view. In particular, a device 1300 is shown with a search screen. In FIG. 13A, an example of a visual search query 1302 is provided. This visual search query 1302 includes an image of a white mug. The results 1306 include various mugs with a white background. In particular embodiments, if a large number of search results are found, a prompt 1304 can be provided to request additional image data from the user for the search query.

In FIG. 13B, an example of another visual search query 1310 is provided in response to prompt 1304 in FIG. 13A. This visual search query 1310 provides a different viewpoint of the object and provides more specific information about the graphics on the mug. This visual search query 1310 yields new results 1312 that are more targeted and accurate. In some examples, an additional prompt 1308 can be provided to notify the user that the search is complete.

Once a surround view is generated, it can be used in various applications, in particular embodiments. One application for a surround view includes allowing a user to navigate a surround view or otherwise interact with it. According to various embodiments, a surround view is designed to simulate the feeling of being physically present in a scene as the user interacts with the surround view. This experience depends not only on the viewing angle of the camera, but on the type of surround view that is being viewed. Although a surround view does not need to have a specific fixed geometry overall, different types of geometries can be represented over a local segment of a surround view, such as concave, convex, and flat surround views, in particular embodiments.

In particular example embodiments, the mode of navigation is informed by the type of geometry represented in a surround view. For instance, with concave surround views, the act of rotating a device (such as a smartphone, etc.) can mimic that of rotating a stationary observer who is looking out at a surrounding scene. In some applications, swiping the screen in one direction can cause the view to rotate in the opposite direction. This effect is akin to having a user stand inside a hollow cylinder and pushing its walls to rotate around the user. In other examples with convex surround views, rotating the device can cause the view to orbit in the direction it is leaning into, such that the object of interest remains centered. In some applications, swiping the screen in one direction causes the viewing angle to rotate in the same direction: this creates the sensation of rotating the object of interest about its axis or having the user rotate around the object. In some examples with flat views, rotating or moving a device can cause the view to translate in the direction of the device's movement. In addition, swiping the screen in one direction can cause the view to translate in the opposite direction, as if pushing foreground objects to the side.
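The swipe behavior described for each geometry can be summarized in a small lookup table. The following is only a sketch: the table name `SWIPE_RESPONSE`, the function `view_change_for_swipe`, and the representation of a swipe as a single signed amount are assumptions made for illustration.

```python
# Hypothetical table: geometry -> (kind of view change, sign relative to swipe).
SWIPE_RESPONSE = {
    "concave": ("rotate", -1),    # view rotates opposite to the swipe
    "convex": ("rotate", +1),     # viewing angle rotates with the swipe
    "flat": ("translate", -1),    # view translates opposite to the swipe
}


def view_change_for_swipe(geometry, swipe_amount):
    """Map a signed swipe amount to a (kind, signed amount) view change."""
    try:
        kind, sign = SWIPE_RESPONSE[geometry]
    except KeyError:
        raise ValueError(f"unknown geometry: {geometry!r}") from None
    return kind, sign * swipe_amount
```

Keeping the mapping in a table rather than in branching logic would make it straightforward to vary the behavior per application, since the text describes these responses as occurring in some applications rather than all.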

In some examples, a user may be able to navigate a multi-surround view or a graph of surround views in which individual surround views can be loaded piece by piece and further surround views may be loaded when necessary (e.g. when they are adjacent to/overlap the current surround view and/or the user navigates towards them). If the user reaches a point in a surround view where two or more surround views overlap, the user can select which of those overlapping surround views to follow. In some instances, the selection of which surround view to follow can be based on the direction the user swipes or moves the device.
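Selecting among overlapping surround views based on swipe direction could look roughly like the sketch below. It assumes, purely for illustration, that each overlapping surround view is associated with a 2D direction on the screen and that the view best aligned with the swipe is followed; the name `pick_next_view` and the cosine-similarity criterion are hypothetical.

```python
import math


def pick_next_view(neighbors, swipe):
    """Pick the overlapping surround view best aligned with the swipe.

    neighbors: dict mapping a surround view name to an assumed (dx, dy)
    direction toward that view. swipe: (dx, dy) of the user's gesture.
    Returns the chosen name, or None if no choice can be made.
    """
    sx, sy = swipe
    swipe_len = math.hypot(sx, sy)
    if swipe_len == 0 or not neighbors:
        return None

    def alignment(item):
        dx, dy = item[1]
        length = math.hypot(dx, dy)
        if length == 0:
            return -1.0  # a neighbor with no direction is never preferred
        # Cosine of the angle between the swipe and the neighbor direction.
        return (sx * dx + sy * dy) / (swipe_len * length)

    return max(neighbors.items(), key=alignment)[0]
```

Whether a swipe should select the view in the same or the opposite direction would depend on the geometry of the current surround view, so the sign convention here is an assumption.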

With reference to FIG. 14, shown is one example of a process for navigating a surround view 1400. In the present example, a request is received from a user to view an object of interest in a surround view at 1402. According to various embodiments, the request can also be a generic request to view a surround view without a particular object of interest, such as when viewing a landscape or panoramic view. Next, a three-dimensional model of the object is accessed at 1404. This three-dimensional model can include all or a portion of a stored surround view. For instance, the three-dimensional model can be a segmented content view in some applications. An initial image is then sent from a first viewpoint to an output device at 1406. This first viewpoint serves as a starting point for viewing the surround view on the output device.

In the present embodiment, a user action is then received to view the object of interest from a second viewpoint. This user action can include moving (e.g. tilting, translating, rotating, etc.) an input device, swiping the screen, etc., depending on the application. For instance, the user action can correspond to motion associated with a locally concave surround view, a locally convex surround view, or a locally flat surround view, etc. According to various embodiments, an object view can be rotated about an axis by rotating a device about the same axis. For example, the object view can be rotated about a vertical axis by rotating the device about the vertical axis. Based on the characteristics of the user action, the three-dimensional model is processed at 1410. For instance, movement of the input device can be detected and a corresponding viewpoint of the object of interest can be found. Depending on the application, the input device and output device can both be included in a mobile device, etc. In some examples, the requested image corresponds to an image captured prior to generation of the surround view. In other examples, the requested image is generated based on the three-dimensional model (e.g. by interpolation, etc.). An image from this viewpoint can be sent to the output device at 1412. In some embodiments, the selected image can be provided to the output device along with a degree of certainty as to the accuracy of the selected image. For instance, when interpolation algorithms are used to generate an image from a particular viewpoint, the degree of certainty can vary and may be provided to a user in some applications. In other examples, a message can be provided to the output device indicating if there is insufficient information in the surround view to provide the requested images.
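The three outcomes described above (a captured image, an interpolated image with a degree of certainty, or an indication of insufficient information) can be illustrated with the following sketch. The certainty formula, the `max_gap_deg` cutoff, and the reduction of a viewpoint to a single angle are all assumptions made for illustration; the disclosure does not specify how certainty is computed.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def image_for_viewpoint(captured, requested_deg, max_gap_deg=45.0):
    """Choose how to serve a requested viewpoint (hypothetical sketch).

    captured: dict mapping capture angle (degrees) to an image identifier.
    Returns (kind, image_id, certainty), or None when the nearest captured
    view is farther than max_gap_deg (insufficient information).
    """
    if not captured:
        return None
    nearest = min(captured, key=lambda a: angular_distance(a, requested_deg))
    gap = angular_distance(nearest, requested_deg)
    if gap == 0:
        return ("captured", captured[nearest], 1.0)
    if gap <= max_gap_deg:
        # Assumed linear falloff: certainty drops as the gap widens.
        return ("interpolated", captured[nearest], 1.0 - gap / max_gap_deg)
    return None
```

A caller receiving None could then send the insufficient-information message to the output device.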

In some embodiments, intermediate images can be sent between the initial image at 1406 and the requested image at 1412. In particular, these intermediate images can correspond to viewpoints located between a first viewpoint associated with the initial image and a second viewpoint associated with the requested image. Furthermore, these intermediate images can be selected based on the characteristics of the user action. For instance, the intermediate images can follow the path of movement of the input device associated with the user action, such that the intermediate images provide a visual navigation of the object of interest.
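As a simple illustration of viewpoints located between the first and second viewpoints, the sketch below spaces intermediate angles evenly along the shorter arc. This is an assumption for illustration only: as noted above, the intermediate images can instead follow the actual path of movement of the input device, and the function name `intermediate_viewpoints` is hypothetical.

```python
def intermediate_viewpoints(start_deg, end_deg, steps):
    """Return `steps` evenly spaced angles strictly between start and end.

    Angles are in degrees and the shorter arc is used (an assumption).
    """
    # Signed shortest rotation from start to end, in [-180, 180).
    delta = (end_deg - start_deg + 180.0) % 360.0 - 180.0
    return [
        (start_deg + delta * i / (steps + 1)) % 360.0
        for i in range(1, steps + 1)
    ]
```

For instance, two intermediate viewpoints between 0 and 90 degrees fall at 30 and 60 degrees.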

With reference to FIG. 15, shown is an example of swipe-based navigation of a surround view. In the present example, three views of device 1500 are shown as a user navigates a surround view. In particular, the input 1510 is a swipe by the user on the screen of device 1500. As the user swipes from right to left, the object of interest moves relative to the direction of swipe 1508. Specifically, as shown by the progression of images 1506, 1504, and 1502, the input 1510 allows the user to rotate around the object of interest (i.e., the man wearing sunglasses).

In the present example, a swipe on a device screen can correspond to rotation of a virtual view. However, other input modes can be used in other example embodiments. For instance, a surround view can also be navigated by tilting a device in various directions and using the device orientation direction to guide the navigation in the surround view. In another example, the navigation can also be based on movement of the screen by the user. Accordingly, a sweeping motion can allow the user to see around the surround view as if the viewer were pointing the device at the object of interest. In yet another example, a website can be used to provide interaction with the surround view in a web-browser. In this example, swipe and/or motion sensors may be unavailable, and can be replaced by interaction with a mouse or other cursor or input device.

According to various embodiments, surround views can also include tagging that can be viewed during navigation. Tagging can provide identification for objects, people, products, or other items within a surround view. In particular, tagging in a surround view is a very powerful tool for presenting products to users/customers and promoting those elements or items. In one example, a tag 1512 can follow the location of the item that is tagged, such that the item can be viewed from different angles while the tag locations still stay valid. The tags 1512 can store different types of data, such as a name (e.g. user name, product name, etc.), a description, a link to a website/webshop, price information, a direct option for purchasing a tagged object, a list of similar objects, etc. In some examples, the tags can become visible when a user selects an item in a surround view. In other examples, the tags can be automatically displayed. In addition, additional information can be accessed by selecting a tag 1512 in some applications. For instance, when a user selects a tag, additional information can be displayed on screen such as a description, link, etc.

In some embodiments, a user can create a tag 1512 by selecting either a point or a region in one viewpoint of a surround view. This point or region is then automatically propagated into other viewpoints. Alternatively, tag locations can be automatically suggested to the user by an application based on different information, such as face detection, object detection, objects in focus, objects that are identified as foreground, etc. In some examples, object detection can be made from a database of known objects or object types/classes.

In the present example, tag 1512 identifies a shirt in the surround view. Of course, any text or title can be included, such as a name, brand, etc. This tag 1512 can be mapped to a particular location in the surround view such that the tag is associated with the same location or point in any view selected. As described above, tag 1512 can include additional information that can be accessed by tapping or otherwise selecting the tag, in some embodiments. Although tagging is shown in FIG. 15, it should be noted that surround views may not include tagging in some examples.
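One way a tag could stay attached to the same point across viewpoints is to anchor it to a three-dimensional location and re-project that location for each view. The sketch below assumes a simple pinhole camera orbiting the object about the vertical axis; the function name `project_tag` and all parameters (focal length, camera distance, screen center) are hypothetical and are not specified by the disclosure.

```python
import math


def project_tag(point, yaw_deg, focal=500.0, camera_distance=3.0,
                center=(320.0, 240.0)):
    """Project a 3D tag anchor into screen coordinates for a given view.

    point: (x, y, z) tag anchor in object coordinates (assumed units).
    yaw_deg: rotation of the view about the vertical (y) axis.
    Returns (u, v) pixel coordinates, or None if the point is behind
    the assumed camera.
    """
    x, y, z = point
    t = math.radians(yaw_deg)
    # Rotate the anchor about the vertical axis to match the viewpoint.
    xr = x * math.cos(t) + z * math.sin(t)
    zr = -x * math.sin(t) + z * math.cos(t)
    depth = zr + camera_distance
    if depth <= 0:
        return None
    # Pinhole projection; screen v grows downward, so y is negated.
    u = center[0] + focal * xr / depth
    v = center[1] - focal * y / depth
    return (u, v)
```

A tag placed at the object's center stays at the screen center for every yaw, while an off-center tag moves across the screen as the view rotates.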

According to various embodiments, surround views can be stored and accessed in various ways. In addition, surround views can be used in many applications. With reference to FIG. 16A, shown are examples of a sharing service for surround views on a mobile device 1602 and browser 1604. The mobile device 1602 and browser 1604 are shown as alternate thumbnail displays 1600, because the surround views can be accessed by either interface, depending on the application. According to various embodiments, a set of surround views can be presented to a user in different ways, including but not limited to: a gallery, a feed, and/or a website. For instance, a gallery can be used to present a collection of thumbnails to a user. These thumbnails can be selected from the surround views either by the user or automatically. In some examples, the size of the thumbnails can vary based on characteristics such as, but not limited to: an automatically selected size that is based on the structure and size of the content it contains; and/or the popularity of the surround view. In another example, a feed can be used to present surround views using interactive thumbnails.

In the present example, surround view thumbnails from a mobile device 1602 include thumbnails 1604 and title/label/description 1604. The thumbnails 1604 can include an image from the surround view. The title/label/description 1604 can include information about the surround view such as title, file name, description of the content, labels, tags, etc.

Furthermore, in the present example, surround view thumbnails from a browser 1604 include thumbnails 1606, title/label/description 1608, and notifications 1610. The thumbnails 1606 can include an image from the surround view. The title/label/description 1608 can include information about the surround view such as title, file name, description of the content, labels, tags, etc. In addition, notifications 1610 can include information such as comments on a surround view, updates about matching content, suggested content, etc. Although not shown on the mobile version, notifications can also be included, but may be omitted in the interest of layout and space considerations in some embodiments. In some examples, notifications can be provided as part of a surround view application on a mobile device.

With reference to FIG. 16B, shown are examples of surround view-related notifications on a mobile device. In particular, alternative notification screens 1620 for a device 1622 are shown that include different formats for notifications. In some examples, a user can navigate between these screens depending on the user's preferences.

In the present example, screen 1624 includes a notification 1626 that includes a recommendation to the user based on content from recent surround views. In particular, the recommendation relates to a trip to Greece based on the application's finding that the user has an affinity for statues. This finding can be inferred from content found in the user's stored or recently browsed surround views, in some examples.

In the present example, screen 1628 includes notifications 1630 based on content from surround views that the user has stored, browsed, etc. For instance, one notification is a recommendation for a pair of shoes available at a nearby retailer that are similar to the user's shoes as provided in a surround view model. The recommendation also includes a link to a map to the retailer. This recommendation can be based on a surround view that the user has saved of a pair of shoes. The other notification is a recommendation to connect to another user that shares a common interest/hobby. In this example, the recommendation is based on the user's detected interest in hats. These recommendations can be provided automatically in some applications as “push” notifications. The content of the recommendations can be based on the user's surround views or browsing history, and visual search algorithms, such as those described with regard to FIGS. 19-22, can be used in some examples.

Screen 1630 shows another form of notification 1632 in the present example. Various icons for different applications are featured on screen 1630. The icon for the surround view application includes a notification 1632 embedded into the icon that shows how many notifications are waiting for the user. When the user selects the icon, the notifications can be displayed and/or the application can be launched, according to various embodiments.

According to various embodiments of the present disclosure, surround views can be used to segment, or separate, objects from static or dynamic scenes. Because surround views include distinctive 3D modeling characteristics and information derived from image data, surround views provide a unique opportunity for segmentation. In some examples, by treating an object of interest as the surround view content, and expressing the remainder of the scene as the context, the object can be segmented out and treated as a separate entity. Additionally, the surround view context can be used to refine the segmentation process in some instances. In various embodiments, the content can be chosen either automatically or semi-automatically using user-guided interaction. One important use for surround view object segmentation is in the context of product showcases in e-commerce, an example of which is shown in FIG. 17B. In addition, surround view-based object segmentation can be used to generate object models that are suited for training artificial intelligence search algorithms that can operate on large databases, in the context of visual search applications.

With reference to FIG. 17, shown is one example of a process for providing object segmentation 1700. At 1702, a first surround view of an object is obtained. Next, content is selected from the first surround view at 1704. In some examples, the content is selected automatically without user input. In other examples, the content is selected semi-automatically using user-guided interaction. The content is then segmented from the first surround view at 1706. In some examples, the content is segmented by reconstructing a model of the content in three-dimensions based on the information provided in the first surround view, including images from multiple camera viewpoints. In particular example embodiments, a mechanism for selecting and initializing a segmentation algorithm based on iterative optimization algorithms (such as graphical models) can be efficiently employed by reconstructing the object of interest, or parts of it, in three-dimensions from multiple camera viewpoints available in a surround view. This process can be repeated over multiple frames, and optimized until segmentation reaches a desired quality output. In addition, segmenting the content can include using the context to determine parameters of the content.

In the present example, once the content is segmented from the first surround view, a second surround view is generated that includes the object without the context or scenery surrounding the object. At 1708, this second surround view is provided. In some examples, the second surround view can then be stored in a database. This second surround view can be used in various applications. For instance, the segmented content includes a product for use in e-commerce. As illustrated in FIG. 17B, the segmented content can be used to show a product from various viewpoints. Another application includes using the second surround view as an object model for artificial intelligence training. In yet another application, the second surround view can be used in 3D printing. In this application, data from the second surround view is sent to a 3D printer.

Although the present example describes segmenting out content from a first surround view, it should be noted that context can also be segmented out in other examples. For instance, the background scenery can be segmented out and presented as a second surround view in some applications. In particular, the context can be selected from the first surround view and the context can be segmented from the first surround view, such that the context is separated into a distinct interactive model. The resulting surround view would then include the scenery surrounding an object but exclude the object itself. A segmented context model can also be used in various applications. For instance, data from the resulting surround view can be sent to a 3D printer. In some examples, this could be printed as a panoramic background on a flat or curved surface. If a content model is also printed, then the object of interest can be placed in front of the panoramic background to produce a three-dimensional “photograph” or model of the surround view. In another application, the segmented out context can be used as background to a different object of interest. Alternatively, segmented out content can be placed in a new segmented out context. In these examples, providing an alternative content or context allows objects of interest to be placed into new backgrounds, etc. For instance, a surround view of a person could be placed in various background contexts, showing the person standing on a beach in one surround view, and standing in the snow in another surround view.
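Placing segmented content into a new context can be illustrated, for a single frame, by a mask-based composite. This is a minimal sketch that assumes the segmentation step has already produced a per-pixel content mask; the function name `composite_frame` and the nested-list image representation are hypothetical, and a full surround view would repeat this over many viewpoints.

```python
def composite_frame(content, mask, context):
    """Overlay segmented content onto a new context for one frame.

    content, context: 2D grids (lists of rows) of pixel values with the
    same shape. mask: 2D grid of booleans, True where a pixel belongs to
    the segmented content (assumed output of the segmentation step).
    """
    return [
        [c if m else b for c, m, b in zip(content_row, mask_row, context_row)]
        for content_row, mask_row, context_row in zip(content, mask, context)
    ]


# Usage: a 2x3 frame where "P" marks person pixels and "B" a beach context.
person = [["P", "P", "x"], ["x", "P", "x"]]
person_mask = [[True, True, False], [False, True, False]]
beach = [["B", "B", "B"], ["B", "B", "B"]]
frame = composite_frame(person, person_mask, beach)
```

Swapping `beach` for a different context grid would yield the same person in a different background, which mirrors the beach and snow example above.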

With reference to FIG. 17B, shown is one example of a segmented object viewed from different angles. In particular, a rotational view 1720 is shown of an athletic shoe. Object views 1722, 1724, 1726, 1728, and 1730 show the athletic shoe from various angles or viewpoints. As shown, the object itself is shown without any background or context. According to various embodiments, these different views of the segmented object can be automatically obtained from surround view content. One application of these types of rotational views is in e-commerce to show product views from different angles. Another application can be in visual search, according to various embodiments.

According to various embodiments, surround views can be generated from data obtained from various sources and can be used in numerous applications. With reference to FIG. 18, shown is a block diagram illustrating one example of various sources that can be used for surround view generation and various applications that can be used with a surround view. In the present example, surround view generation and applications 1800 includes sources for image data 1808 such as internet galleries 1802, repositories 1804, and users 1806. In particular, the repositories can include databases, hard drives, storage devices, etc. In addition, users 1806 can include images and information obtained directly from users such as during image capture on a smartphone, etc. Although these particular examples of data sources are indicated, data can be obtained from other sources as well. This information can be gathered as image data 1808 to generate a surround view 1810, in particular embodiments.

In the present example, a surround view 1810 can be used in various applications. As shown, a surround view can be used in applications such as e-commerce 1812, visual search 1814, 3D printing 1816, file sharing 1818, user interaction 1820, and entertainment 1822. Of course, this list is only illustrative, and surround views can also be used in other applications not explicitly noted.

As described above with regard to segmentation, surround views can be used in e-commerce 1812. For instance, surround views can be used to allow shoppers to view a product from various angles. In some applications, shoppers can even use surround views to determine sizing, dimensions, and fit. In particular, a shopper can provide a self-model and determine from surround views whether the product would fit the model. Surround views can also be used in visual search 1814 as described in more detail below with regard to FIGS. 19-22. Some of the visual search applications can also relate to e-commerce, such as when a user is trying to find a particular product that matches a visual search query.

Another application of segmentation includes three-dimensional printing (3D printing) 1816. Three-dimensional printing has been recently identified as one of the future disruptive technologies that will improve the global economy in the next decade. According to various embodiments, content can be 3D printed from a surround view. In addition, the panoramic background context in a surround view can also be printed. In some examples, a printed background context can complement the final 3D printed product for users that would like to preserve memories in a 3D printed format. For instance, the context could be printed either as a flat plane sitting behind the 3D content, or as any other geometric shape (spherical, cylindrical, U shape, etc.).

As described above with regard to FIG. 16A, surround views can be stored with thumbnail views for user access. This type of application can be used for file sharing 1818 between users in some examples. For instance, a site can include infrastructure for users to share surround views in a manner similar to current photo sharing sites. File sharing 1818 can also be implemented directly between users in some applications.

Also as described with regard to FIGS. 14 and 15, user interaction is another application of surround views. In particular, a user can navigate through a surround view for their own pleasure or entertainment. Extending this concept to entertainment 1822, surround views can be used in numerous ways. For instance, surround views can be used in advertisements, videos, etc.

As previously described, one application of surround views is visual search. FIGS. 19, 20, and 22 depict examples of visual search using surround views. According to various embodiments, using surround views can provide much higher discriminative power in search results than any other digital media representation to date. In particular, the ability to separate content and context in a surround view is an important aspect that can be used in visual search.

Existing digital media formats such as 2D images are unsuitable for indexing, in the sense that they do not have enough discriminative information available natively. As a result, many billions of dollars are spent in research on algorithms and mechanisms for extracting such information from them. This has resulted in satisfactory results for some problems, such as facial recognition, but in general the problem of figuring out a 3D shape from a single image is ill-posed in existing technologies. Although the level of false positives and negatives can be reduced by using sequences of images or 2D videos, the 3D spatial reconstruction methods previously available are still inadequate.

According to various embodiments, additional data sources such as location-based information, which are used to generate surround views, provide valuable information that improves the capability of visual recognition and search. In particular example embodiments, two components of a surround view, the context and the content, both contribute significantly in the visual recognition process. In particular example embodiments, the availability of three-dimensional information that the content offers can significantly reduce the number of hypotheses that must be evaluated to recognize a query object or part of a scene. According to various embodiments, the content's three-dimensional information can help with categorization (i.e., figuring out the general category that an object belongs to), and the two-dimensional texture information can indicate more about a specific instance of the object. In many cases, the context information in a surround view can also aid in the categorization of a query object, by explaining the type of scene in which the query object is located.

In addition to providing information that can be used to find a specific instance of an object, surround views are also natively suited for answering questions such as: “what other objects are similar in shape and appearance?” Similar to the top-N best matches provided in response to a web search query, a surround view can be used with object categorization and recognition algorithms to indicate the “closest matches,” in various examples.

Visual search using surround views can be used and/or implemented in various ways. In one example, visual search using surround views can be used in object recognition for robotics. In another example, visual search using surround views can be used in social media curation. In particular, by analyzing the surround view data being posted to various social networks, and recognizing objects and parts of scenes, better #hashtag indices can be automatically generated. By generating this type of information, feeds can be curated and the search experience can be enhanced.

Another example in which visual search using surround views can be used is in a shopping context that can be referred to as “Search and Shop.” In particular, this visual search can allow recognition of items that are similar in shape and appearance, but might be sold at different prices in other stores nearby. For instance, with reference to FIG. 21, a visual search query may yield similar products available for purchase.

Yet another example in which visual search using surround views can be used is in a shopping context that can be referred to as “Search and Fit.” According to various embodiments, because surround view content is three-dimensional, precise measurements can be extracted and this information can be used to determine whether a particular object represented in a surround view would fit in a certain context (e.g., a shoe fitting a foot, a lamp fitting a room, etc.).
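A simplified form of such a fit determination is sketched below for box-shaped extents. It assumes that measurements extracted from a surround view can be reduced to three bounding dimensions and that the object may be turned to any axis-aligned orientation; the function name `fits` and the `clearance` parameter are hypothetical. Real fit checks, such as a shoe fitting a foot, would need richer shape information than a bounding box.

```python
def fits(object_dims, space_dims, clearance=0.0):
    """Check whether a box-shaped object fits in a box-shaped space.

    object_dims, space_dims: three extents each, in the same units.
    clearance: assumed extra margin required along every dimension.
    Sorting both sets of extents lets the object be reoriented so that
    its smallest side is matched against the smallest side of the space.
    """
    return all(
        o + clearance <= s
        for o, s in zip(sorted(object_dims), sorted(space_dims))
    )
```

For example, a lamp measuring 30 x 20 x 50 fits a space measuring 55 x 25 x 35 once it is reoriented, while a lamp measuring 30 x 20 x 60 does not.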

In another instance, visual search using surround views can also be used to provide better marketing recommendation engines. For example, by analyzing the types of objects that appear in surround views generated by various users, questions such as “what type of products do people really use in their daily lives” can be answered in a natural, private, and non-intrusive way. Gathering this type of information can facilitate improved recommendation engines and decrease and/or stop unwanted spam or marketing ads, thereby increasing the quality of life of most users. FIG. 16B shows one implementation in which recommendations can be provided according to various embodiments of the present invention.

With reference to FIG. 19, shown is one example of a process for providing visual search of an object 1900, where the search query includes a surround view of the object and the data searched includes three-dimensional models. At 1902, a visual search query that includes a first surround view is received. This first surround view is then compared to stored surround views at 1904. In some embodiments, this comparison can include extracting first measurement information for the object in the first surround view and comparing it to second measurement information extracted from the one or more stored surround views. For instance, this type of measurement information can be used for searching items such as clothing, shoes, or accessories.

Next, a determination is made whether any stored surround views correspond to the first surround view at 1906. In some examples, this determination is based on whether the subject matter in any of the stored surround views is similar in shape to the object in the first surround view. In other examples, this determination is based on whether any of the subject matter in the stored surround views is similar in appearance to the object in the first surround view. In yet other examples, this determination is based on whether any subject matter in the stored surround views includes textures similar to those included in the first surround view. In some instances, this determination is based on whether any of the contexts associated with the stored surround views match the context of the first surround view. In another example, this determination is based on whether the measurement information associated with a stored surround view dimensionally fits the object associated with the first surround view. Of course, any of these bases can be used in conjunction with each other.

Once this determination is made, a ranked list of matching results is generated at 1908. In some embodiments, generating a ranked list of matching results includes indicating how closely any of the stored surround views dimensionally fits the object associated with the first measurement information. According to various embodiments, this ranked list can include displaying thumbnails of matching results. In some examples, links to retailers can be included with the thumbnails. Additionally, information about the matching results such as name, brand, price, sources, etc. can be included in some applications.
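The comparison at 1904 through the ranked list at 1908 can be sketched as follows, under stated assumptions. The `StoredView` record, the mean relative difference used as a fit error, and the `max_error` cutoff are all hypothetical choices made for illustration; the disclosure does not prescribe a particular metric. Measurements are assumed to be positive values in a common unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StoredView:
    """Hypothetical record for a stored surround view and its metadata."""
    name: str
    price: float
    measurements: Dict[str, float]  # e.g. {"length": 27.0, "width": 10.0}


def fit_error(query: Dict[str, float], stored: Dict[str, float]) -> float:
    """Mean relative difference over shared measurement keys (inf if none)."""
    keys = query.keys() & stored.keys()
    if not keys:
        return float("inf")
    return sum(abs(query[k] - stored[k]) / query[k] for k in keys) / len(keys)


def rank_matches(query: Dict[str, float],
                 catalog: List[StoredView],
                 max_error: float = 0.10) -> List[Tuple[float, StoredView]]:
    """Keep stored views within max_error and order them best fit first."""
    scored = [(fit_error(query, s.measurements), s) for s in catalog]
    kept = [(err, s) for err, s in scored if err <= max_error]
    return sorted(kept, key=lambda pair: pair[0])


# Hypothetical first measurement information and stored surround views.
query = {"length": 27.0, "width": 10.0}
catalog = [
    StoredView("Runner A", 79.0, {"length": 27.5, "width": 10.0}),
    StoredView("Runner B", 65.0, {"length": 27.0, "width": 10.2}),
    StoredView("Boot C", 120.0, {"length": 31.0, "width": 11.5}),
]
for err, item in rank_matches(query, catalog):
    print(f"{item.name}: fit error {err:.3f}, price {item.price:.2f}")
```

Returning the error alongside each result lets an application indicate how closely each match dimensionally fits, and the metadata fields stand in for the name, brand, price, and source information mentioned above.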

Although the previous example includes using a surround view as a visual search query to search through stored surround views or three-dimensional models, current infrastructure still includes a vast store of two-dimensional images. For instance, the internet provides access to numerous two-dimensional images that are easily accessible. Accordingly, using a surround view to search through stored two-dimensional images for matches can provide a useful application of surround views with the current two-dimensional infrastructure.

With reference to FIG. 20, shown is one example of a process for providing visual search of an object 2000, where the search query includes a surround view of the object and the data searched includes two-dimensional images. At 2002, a visual search query that includes a first surround view is received. Next, object view(s) are selected from the surround view at 2004. In particular, one or more two-dimensional images are selected from the surround view. Because these object view(s) will be compared to two-dimensional stored images, selecting multiple views can increase the odds of finding a match. Furthermore, selecting one or more object views from the surround view can include selecting object views that provide recognition of distinctive characteristics of the object.

In the present example, the object view(s) are then compared to stored images at 2006. In some embodiments, one or more of the stored images can be extracted from stored surround views. These stored surround views can be retrieved from a database in some examples. In various examples, comparing the one or more object views to the stored images includes comparing the shape of the object in the surround view to the stored images. In other examples, comparing the one or more object views to the stored images includes comparing the appearance of the object in the surround view to the stored images. Furthermore, comparing the one or more object views to the stored images can include comparing the texture of the object in the surround view to the stored images. In some embodiments, comparing the one or more object views to the stored images includes comparing the context of the object in the surround view to the stored images. Of course, any of these criteria for comparison can be used in conjunction with each other.

Next, a determination is made whether any stored images correspond to the object view(s) at 2008. Once this determination is made, a ranked list of matching results is generated at 2010. According to various embodiments, this ranked list can include displaying thumbnails of matching results. In some examples, links to retailers can be included with the thumbnails. Additionally, information about the matching results such as name, brand, price, sources, etc. can be included in some applications.
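The flow from 2004 through 2010 can be illustrated with the following sketch, which assumes that each selected object view and each stored image has already been reduced to a numeric descriptor vector. The descriptors, the use of cosine similarity, the `threshold` value, and the function names are assumptions made for illustration only; the disclosure leaves the comparison criteria (shape, appearance, texture, context) open.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_images(object_views: List[Sequence[float]],
                  stored_images: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Score each stored image by its best-matching object view, then rank.

    `object_views` must contain at least one descriptor.
    """
    results = []
    for image_id, descriptor in stored_images.items():
        # Several views are tried so that any one of them can produce a match.
        best = max(cosine(view, descriptor) for view in object_views)
        if best >= threshold:
            results.append((image_id, best))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical descriptors for two object views and three stored images.
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
stored = {
    "img_front": [0.9, 0.1, 0.0],
    "img_side": [0.0, 1.0, 0.0],
    "img_other": [0.0, 0.0, 1.0],
}
for image_id, score in search_images(views, stored):
    print(f"{image_id}: {score:.3f}")
```

Taking the maximum over the selected views reflects the observation above that selecting multiple views can increase the odds of finding a match.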

With reference to FIG. 21, shown is an example of a visual search process 2100. In the present example, images are obtained at 2102. These images can be captured by a user or pulled from stored files. Next, according to various embodiments, a surround view is generated based on the images. This surround view is then used as a visual search query that is submitted at 2104. In this example, a surround view can be used to answer questions such as “which other objects in a database look like the query object.” As illustrated, surround views can help shift the visual search paradigm from finding other “images that look like the query,” to finding other “objects that look like the query,” due to their better semantic information capabilities. As described with regard to FIGS. 19 and 20 above, the surround view can then be compared to the stored surround views or images and a list of matching results can be provided at 2106.

Although the previous examples of visual search include using surround views as search queries, it may also be useful to provide search queries for two-dimensional images in some embodiments. With reference to FIG. 22, shown is an example of a process for providing visual search of an object 2200, where the search query includes a two-dimensional view of the object and the data searched includes surround view(s). At 2202, a visual search query that includes a two-dimensional view of an object to be searched is received. In some examples, the two-dimensional view is obtained from an object surround view, wherein the object surround view includes a three-dimensional model of the object. Next, the two-dimensional view is compared to surround views at 2204. In some examples, the two-dimensional view can be compared to one or more content views in the surround views. In particular, the two-dimensional view can be compared to one or more two-dimensional images extracted from the surround views from different viewing angles. According to various examples, the two-dimensional images extracted from the surround views correspond to viewing angles that provide recognition of distinctive characteristics of the content. In other examples, comparing the two-dimensional view to one or more surround views includes comparing the two-dimensional view to one or more content models. Various criteria can be used to compare the images or models such as the shape, appearance, texture, and context of the object. Of course, any of these criteria for comparison can be used in conjunction with each other.

With reference to FIG. 23, shown is a particular example of a computer system that can be used to implement particular examples of the present invention. For instance, the computer system 2300 can be used to provide surround views according to various embodiments described above. According to particular example embodiments, a system 2300 suitable for implementing particular embodiments of the present invention includes a processor 2301, a memory 2303, an interface 2311, and a bus 2315 (e.g., a PCI bus). The interface 2311 may include separate input and output interfaces, or may be a unified interface supporting both operations. When acting under the control of appropriate software or firmware, the processor 2301 is responsible for tasks such as optimization. Various specially configured devices can also be used in place of a processor 2301 or in addition to processor 2301. The complete implementation can also be done in custom hardware. The interface 2311 is typically configured to send and receive data packets or data segments over a network. Particular examples of interfaces the device supports include Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like.

In addition, various very high-speed interfaces may be provided such as fast Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile RAM. The independent processors may control such communications intensive tasks as packet switching, media control and management.

According to particular example embodiments, the system 2300 uses memory 2303 to store data and program instructions and to maintain a local side cache. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store received metadata and batch requested metadata.

Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to tangible, machine readable media that include program instructions, state information, etc. for performing various operations described herein. Examples of machine-readable media include hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks and DVDs; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and programmable read-only memory devices (PROMs). Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

Although many of the components and processes are described above in the singular for convenience, it will be appreciated by one of skill in the art that multiple components and repeated processes can also be used to practice the techniques of the present disclosure.

While the present disclosure has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention. It is therefore intended that the invention be interpreted to include all variations and equivalents that fall within the true spirit and scope of the present invention.

What is claimed is:
 1. A method comprising: obtaining a first surround view of an object, the first surround view corresponding to a multi-view interactive digital media representation of the object, wherein the first surround view includes a first layer and a second layer, wherein the first layer includes a first depth and the second layer includes a second depth; selecting the first layer from the first surround view, wherein selecting the first layer includes selecting data within the first depth; applying an effect to the first layer within the first surround view to produce a modified first layer; and generating a second surround view including the modified first layer and the second layer.
 2. The method of claim 1, wherein the first surround view further includes a third layer, the third layer including a third depth.
 3. The method of claim 2, wherein the second surround view is generated using the modified first layer, the second layer, and the third layer.
 4. The method of claim 1, wherein the first layer includes an object.
 5. The method of claim 4, wherein the object is a person.
 6. The method of claim 1, wherein the first layer includes dynamic scene elements.
 7. The method of claim 1, wherein the effect is a blurring filter.
 8. The method of claim 7, wherein the effect is a gray scale filter.
 9. The method of claim 1, wherein the effect includes moving the first layer at a first speed, wherein the second layer is moved at a second speed, and wherein the first speed is different from the second speed.
 10. The method of claim 1, wherein the first surround view includes a third layer, and wherein the second surround view omits the third layer.
 11. The method of claim 1, wherein selecting the first layer is performed automatically without user input.
 12. The method of claim 1, wherein the object is displayed on a device, wherein the viewing angle of the object is manipulated by tilting and rotating the device.
 13. The method of claim 12, wherein the viewing angle of the object is manipulated by rotating the device along an axis corresponding to the same axis of the object.
 14. The method of claim 1, wherein the object is displayed on a device, wherein the viewing angle of the object is manipulated by swiping over the display.
 15. A computer readable medium comprising: computer code for obtaining a first surround view of an object, the first surround view corresponding to a multi-view interactive digital media representation of the object, wherein the first surround view includes a first layer and a second layer, wherein the first layer includes a first depth and the second layer includes a second depth; computer code for selecting the first layer from the first surround view, wherein selecting the first layer includes selecting data within the first depth; computer code for applying an effect to the first layer within the first surround view to produce a modified first layer; and computer code for generating a second surround view including the modified first layer and the second layer.
 16. The computer readable medium of claim 15, wherein the first surround view further includes a third layer, the third layer including a third depth.
 17. The computer readable medium of claim 16, wherein the second surround view is generated using the modified first layer, the second layer, and the third layer.
 18. The computer readable medium of claim 15, wherein the first layer includes an object.
 19. The computer readable medium of claim 15, wherein the first layer includes dynamic scene elements.
 20. A system comprising: means for obtaining a first surround view of an object, the first surround view corresponding to a multi-view interactive digital media representation of the object, wherein the first surround view includes a first layer and a second layer, wherein the first layer includes a first depth and the second layer includes a second depth; means for selecting the first layer from the first surround view, wherein selecting the first layer includes selecting data within the first depth; means for applying an effect to the first layer within the first surround view to produce a modified first layer; and means for generating a second surround view including the modified first layer and the second layer.